Relationships between Molecular Complexity, Biological Activity, and Structural Diversity

Authors

  • Ansgar Schuffenhauer
  • Nathan Brown
  • Paul Selzer
  • Peter Ertl
  • Edgar Jacoby
Abstract

Following the theoretical model of Hann et al., moderately complex structures are preferable lead compounds because they lead to specific binding events involving the complete ligand molecule. To make this concept usable in practice for library design, we studied the influence of several complexity measures on the biological activity of ligand molecules. We used the historical IC50/EC50 summary data of 160 assays run at Novartis covering a diverse range of targets, among them kinases, proteases, GPCRs, and protein-protein interactions, and compared these data against a background of "inactive" compounds that had been screened for 2 years but had never shown activity in any primary screen. As complexity measures we used the number of structural features present in various molecular fingerprints and descriptors. We found that, in general, the average complexity of the ligands increased with their activity, and we could therefore establish, for each descriptor, a minimum number of structural features needed for biological activity. Especially well suited in this context were the Similog keys and circular substructure fingerprints. These are the same descriptors that perform especially well in identifying bioactive compounds by similarity search, suggesting that the structural features they encode are highly relevant for bioactivity. Since the number of features correlates with the number of atoms in the molecule, the number of atoms also serves as a reasonable complexity measure, and larger molecules have, in general, higher activities. Because of the relationship between feature counts and densities on the one hand and biological activity on the other, the size bias present in almost all similarity coefficients becomes especially important. Diversity selections using these coefficients can influence the overall complexity of the resulting set of molecules, which in turn affects the biological activity that they exhibit.
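The feature-count complexity measure and the size bias of a similarity coefficient can be illustrated with a minimal Python sketch. It assumes that a fingerprint is represented as a set of integer feature identifiers; the identifiers, the function names, and the two toy molecules are hypothetical and are not taken from the paper, which does not describe its implementation in this abstract. The sketch shows that the Tanimoto coefficient between two fingerprints can never exceed the ratio of the smaller to the larger feature count, so a small molecule is necessarily scored as dissimilar to a much larger one.

```python
from typing import FrozenSet

# Assumption: a fingerprint is the set of feature identifiers present in a molecule.
Fingerprint = FrozenSet[int]


def feature_count(fp: Fingerprint) -> int:
    """Complexity proxy: the number of distinct features in the fingerprint."""
    return len(fp)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by features in either set."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical (a convention of this sketch).
    return len(a & b) / union if union else 1.0


def max_tanimoto(n_a: int, n_b: int) -> float:
    """Upper bound on the Tanimoto coefficient for two given feature counts.

    The intersection has at most min(n_a, n_b) features and the union has at
    least max(n_a, n_b), so the bound is reached when one set contains the other.
    """
    if n_a == 0 and n_b == 0:
        return 1.0
    return min(n_a, n_b) / max(n_a, n_b)


# Hypothetical toy fingerprints: a small fragment fully contained in a larger molecule.
fragment: Fingerprint = frozenset({1, 2})
lead: Fingerprint = frozenset(range(1, 11))

similarity = tanimoto(fragment, lead)
bound = max_tanimoto(feature_count(fragment), feature_count(lead))
```

Here the fragment shares all of its features with the larger molecule, yet its similarity is capped by the ratio of the two feature counts. This is one way to picture why a dissimilarity-driven selection may treat low-complexity molecules as "diverse".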
With sphere-exclusion-based diversity selection methods such as OptiSim, used together with the Tanimoto dissimilarity, the feature count distribution of the resulting selections is shifted, on average, toward lower complexity than that of the original set, particularly when tight diversity constraints are applied. This size bias reduces the fraction of molecules in the subsets that have the complexity required for high, submicromolar activity. None of the diversity selection methods studied, namely OptiSim, divisive K-means clustering, and self-organizing maps, yielded subsets covering the activity space of the IC50 summary data set better than randomly selected subsets.
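The selection effect described above can be sketched with a plain greedy sphere-exclusion pass. This is a simplified stand-in and not OptiSim itself: a compound is kept only if its Tanimoto similarity to every compound already selected is below a threshold. The set-based fingerprints, the threshold value, and the five-compound pool are hypothetical and were constructed to make the shift in mean feature count visible; they are not data from the study.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of feature identifiers present in a molecule.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two set-based fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sphere_exclusion(pool: Sequence[Fingerprint], max_similarity: float) -> List[int]:
    """Greedy sphere exclusion over the pool in its given order.

    A compound is selected only if it is less similar than max_similarity
    to every compound selected so far. Returns indices into the pool.
    """
    selected: List[int] = []
    for idx, fp in enumerate(pool):
        if all(tanimoto(fp, pool[s]) < max_similarity for s in selected):
            selected.append(idx)
    return selected


def mean_feature_count(pool: Sequence[Fingerprint], indices: Sequence[int]) -> float:
    """Average number of features over the compounds at the given indices."""
    return sum(len(pool[i]) for i in indices) / len(indices)


# Hypothetical toy pool: three larger, closely related compounds and two small ones.
pool: List[Fingerprint] = [
    frozenset({1, 2, 3, 4, 5, 6}),
    frozenset({1, 2, 3, 4, 5, 7}),
    frozenset({1, 2}),
    frozenset({8, 9}),
    frozenset({1, 2, 3, 4, 5, 6, 7}),
]

picked = sphere_exclusion(pool, max_similarity=0.5)
pool_mean = mean_feature_count(pool, range(len(pool)))
picked_mean = mean_feature_count(pool, picked)
```

In this toy pool the two larger analogues of the first compound are excluded as too similar, while both small compounds pass the threshold, so `picked_mean` ends up below `pool_mean`. Real selections depend on the descriptor, the processing order, and the threshold, so this only illustrates the direction of the bias.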


Similar resources

Quantitative Structure-Activity Relationship Studies of 4-Imidazolyl- 1,4-dihydropyridines as Calcium Channel Blockers

Objective(s): The structure-activity relationship of a series of 36 molecules showing L-type calcium channel blocking activity was studied using a QSAR (quantitative structure–activity relationship) method. Materials and Methods: Structures were optimized by the semi-empirical AM1 quantum-chemical method, which was also used to find structure-calcium channel blocking activity trends. Several types of ...


Genetic diversity and relationships among traits in potato genotypes using agronomic traits and molecular marker (SSR).

The molecular marker (SSR) has been used to investigate the markers associated with the agronomic traits including days to 50% flowering, tube ring time, days to maturity, plant height, the number of main stems per plant, the number of tubers per plant, dry matter content, main stem diameter, a single tuber weight, average single tuber weight, and the total yield in potato genotypes. Ten primer...


Molecular diversity within and between Ajowan (Carum copticum L.) populations based on inter simple sequence repeat (ISSR) markers

The study of genetic relationships is a prerequisite for plant breeding activities as well as for the conservation of genetic resources. In the present study, genetic diversity among and within 15 Iranian native Ajowan (Carum copticum L.) populations was determined using inter simple sequence repeat (ISSR) markers. Twelve selected primers produced 153 discernible bands, with 93 (60.78%) being ...


Genetic Diversity of Bread Wheat (Triticum aestivum L.) Genotypes Using RAPD and ISSR Molecular Markers

The importance of grain cultivation especially wheat is obvious in terms of providing human and animal food and its impact on the economy of human societies. The reduction of genetic diversity in cultivars prevents increasing yields in line with rising demand and consumption. Therefore, it is necessary to improve the compatibility of them and increase their genetic extent. In the current, the g...


Genetic Diversity and Molecular Phylogeny of Iranian Sheep Based on Cytochrome b Gene Sequences

Phylogenetic relationships and genetic variation between two Iranian sheep breeds were analyzed using cytochrome b (cyt-b) gene sequences. The genomic DNA was isolated by salting out method and amplified cytochrome b gene using polymerase chain reaction restriction (PCR) method with a pair of primer. A partial sequence of cyt-b gene of Iranian sheep is 780 bp and contained 13 variable sites and...



Journal title:
  • Journal of Chemical Information and Modeling

Volume 46, Issue 2

Pages  -

Publication date: 2006